Negation scope and spelling variation for text-mining of Danish electronic patient records
Abstract
Electronic patient records are a potentially rich data source for knowledge extraction in biomedical research. Here we present a method based on the ICD10 system for text-mining of Danish health records, and evaluate how adding functionalities to a baseline text-mining tool affected its overall performance. The purpose of the tool was to create enriched phenotypic profiles for each patient in a corpus consisting of records from 5,543 patients at a Danish psychiatric hospital, by assigning each patient additional ICD10 codes based on the free-text parts of these records. The tool was benchmarked against a manually curated test set consisting of all records from 50 patients. It was designed to handle spelling and ending variations, shuffling of tokens within a term, and the introduction of gaps in terms. In particular, we investigated the importance of negation identification and negation scope. The most important functionality was the handling of spelling variation, which greatly increased the number of phenotypes that could be identified in the records without noticeably decreasing precision. Further, our results show that different negations have different optimal scopes: some span only a few words, while others span up to whole sentences.
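The two functionalities highlighted in the abstract, tolerance to spelling variation and a negation scope that depends on the negation cue, can be illustrated with a minimal sketch. This is not the tool described in the paper: the term dictionary, the similarity threshold, the negation cues, and their scope lengths are all hypothetical values chosen for illustration, and Python's standard `difflib` stands in for whatever matching strategy the authors used.

```python
from difflib import SequenceMatcher

# Hypothetical mini-dictionary: term -> ICD10 code (illustrative only).
TERMS = {"depression": "F32", "skizofreni": "F20"}

# Hypothetical negation cues, each with its own scope
# (number of tokens after the cue that count as negated).
NEGATION_SCOPE = {"ikke": 2, "ingen": 4}


def match_term(token, threshold=0.85):
    """Return the ICD10 code of the most similar term, or None.

    Similarity is difflib's ratio, so small misspellings still match.
    """
    best_code, best_ratio = None, 0.0
    for term, code in TERMS.items():
        ratio = SequenceMatcher(None, token, term).ratio()
        if ratio > best_ratio:
            best_code, best_ratio = code, ratio
    return best_code if best_ratio >= threshold else None


def extract_codes(sentence):
    """Collect ICD10 codes for terms that are not inside a negation scope."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    negated_until = -1  # index of the last token covered by a negation
    codes = set()
    for i, token in enumerate(tokens):
        if token in NEGATION_SCOPE:
            negated_until = max(negated_until, i + NEGATION_SCOPE[token])
            continue
        code = match_term(token)
        if code is not None and i > negated_until:
            codes.add(code)
    return codes
```

In this sketch a misspelled "depresion" still maps to its code, a term four tokens after the short-scope cue "ikke" is kept, and a term within four tokens of the longer-scope cue "ingen" is dropped. Giving each cue its own window is one simple way to model the finding that different negations have different optimal scopes.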
Similar resources
Retrieving disorders and findings: Results using SNOMED CT and NegEx adapted for Swedish
Access to reliable data from electronic health records is of high importance in several key areas in patient care, biomedical research, and education. However, many of the clinical entities are negated in the patient record text. Detecting what is a negation and what is not is therefore a key to high quality text mining. In this study we used the NegEx system adapted for Swedish to investigate ...
Annotation of negation in the IULA Spanish Clinical Record Corpus
This paper presents the IULA Spanish Clinical Record Corpus, a corpus of 3,194 sentences extracted from anonymized clinical records and manually annotated with negation markers and their scope. The corpus was conceived as a resource to support clinical text-mining systems, but it is also a useful resource for other Natural Language Processing systems handling clinical texts: automatic encoding ...
Extraction of Drug-Drug Interaction from Literature through Detecting Linguistic-based Negation and Clause Dependency
Extracting biomedical relations such as drug-drug interaction (DDI) from text is an important task in biomedical NLP. Due to the large number of complex sentences in biomedical literature, researchers have employed some sentence simplification techniques to improve the performance of the relation extraction methods. However, due to difficulty of the task, there is no noteworthy improvement in t...
Text Data Mining of In-patient Nursing Records Within Electronic Medical Records Using KeyGraph
This research used a text data mining technique to extract useful information from nursing records within Electronic Medical Records. Although nursing records provide a complete account of a patient’s information, they are not being fully utilized. Such relevant information as laboratory results and remarks made by doctors and nurses is not always considered. Knowledge concerning the condition ...
Some Aspects of Negation Processing in Electronic Health Records
The presented paper discusses a hybrid approach for negation processing in Electronic Health Records (EHRs) in Bulgarian. The rich temporal structure and the specific combination of medical terminology in both Bulgarian and Latin do not allow the application of standard language processing techniques. The problem gets even worse due to the frequent use of specific abbreviations, analyses and clini...